Issue #1292: es6 style template string #1316

ghost · 2017-03-06T23:06:40Z

Hi, this is the first implementation. With it, we can write:

let str = {js|你的名字|js};;

let x_1 = "world";;

let x_2 = {js| Bucklescript by 彭博 |js};;

let es6 = {j|hello $x_1,欢迎来到 $(x_2)|j};;

let es62 = {j|$str, 君の名は|j}

let a = "a";;

let b = "b";;

let c = a ^ b;;

let d = (^) a b;;

let c = Js.Json.stringify str;;
let () = Js.log str;;

This compiles to:

// Generated by BUCKLESCRIPT VERSION 1.5.1+dev, PLEASE EDIT WITH CARE
'use strict';


var str = "你的名字";

var x_1 = "world";

var x_2 = " Bucklescript by 彭博 ";

var es6 = "" + "hello " + JSON.stringify(x_1) + ",欢迎来到 " + JSON.stringify(x_2);

var es62 = "" + JSON.stringify(str) + ", 君の名は";

var a = "a";

var b = "b";

var d = "ab";

var c = JSON.stringify(str);

console.log(str);

exports.str  = str;
exports.x_1  = x_1;
exports.x_2  = x_2;
exports.es6  = es6;
exports.es62 = es62;
exports.a    = a;
exports.b    = b;
exports.d    = d;
exports.c    = c;
/* es6 Not a pure module */

From what I can see, there are a few issues I will be working on:

Stop using JSON.stringify to convert values to strings: for a string it adds quotes, so JSON.stringify(x_1) above yields "\"world\"" rather than "world". This should be done some other way, but my JS knowledge is limited here. Any good suggestions?
Add more unit tests for how errors are reported at specific locations.
To use regexes in OCaml, I added a dependency on the Str library (copied in as Ext_str). Do you have an opinion on this?
Str depends on a C file, which I also pulled in; compiling it with clang raises some warnings. I will look into removing them later.

Anyway, any other suggestions are very welcomed!

glennsl · 2017-03-07T04:05:02Z

This is awesome @dorafmon. Here's my opinion:

It probably shouldn't do automatic string conversion; not doing it would be more idiomatic and a bit safer. But if it does, the right way is to use either the String constructor function (available as Js.String.make) or the toString method that every JS object has (not currently exposed via the FFI, I think), though the latter risks an exception if you call it on a null or undefined value.
The use of regexes here seems simple enough that it might not be worth it to pull in an extra dependency. But if you need a regex library, this is a pure OCaml implementation that looks very, very good: https://github.com/ocaml/ocaml-re

bobzhang · 2017-03-07T14:28:29Z

jscomp/ext/ext_str.mli

+(** Compile a regular expression. The syntax for regular expressions
+           is the same as in Gnu Emacs. The special characters are
+           [$^.*+?[]]. The following constructs are recognized:
+    -          [.     ] matches any character except newline


Str comes with the OCaml core distribution, so you don't need to copy it here. But I would suggest not using it at all, since you only need one function, which can easily be replaced with a plain string function.
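As an illustration of the "plain string function" route, here is a minimal sketch. The name split_at_dollar is hypothetical, not from this PR. It splits a template body at each `$` with String.split_on_char, which has been in the standard library since OCaml 4.04. It deliberately ignores escapes such as `\$`, so it only covers the simplest case:

```ocaml
(* Split a template body at every '$' using only the standard library.
   Escapes like \$ are NOT handled; this is only the simplest case. *)
let split_at_dollar (s : string) : string list =
  String.split_on_char '$' s

let () =
  split_at_dollar "hello $x_1, welcome to $(x_2)"
  |> List.iter (fun part -> Printf.printf "[%s]\n" part)
```

Each piece after the first starts with the interpolated name (or `(name)`), so a real implementation still has to peel the identifier off the front of each piece and handle escapes.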

bobzhang · 2017-03-07T14:29:39Z

jscomp/others/js_json.mli

 val test : 'a  -> 'b kind -> bool

 external parse : string -> t = "JSON.parse" [@@bs.val]
+external stringify: 'a -> string = "JSON.stringify" [@@bs.val]


How about adding a function to Js.String:

external ofAny : 'a -> t = "String" [@@bs.val]

There's already Js.String.make, which does exactly that

bobzhang · 2017-03-07T14:30:13Z

jscomp/syntax/ast_utf8_string.ml

+let rec print_string_list ss = match ss with
+| [] -> ()
+| (hd::tl) -> let _ = print_endline hd in print_string_list tl
+


This is just List.iter print_endline.

bobzhang · 2017-03-07T14:32:18Z

jscomp/test/es6_style_string.ml

+
+let es62 = {j|$str, 君の名は|j}
+
+let a = "a";;


You need to add test cases here:

let a = {j| blabla \$(xx) |j} (* should not be interpolated *)
let b = {j| blabla \$xxx |j} (* should not be interpolated *)

bobzhang · 2017-03-07T14:35:38Z

jscomp/syntax/ast_utf8_string.ml

+              } in _transform_individual_expression rexp new_loc (new_exp::nl))
+
+let transform_es6_style_template_string s loc =
+  let sub_strs = Ext_str.full_split template_string_regex s


We should have an Ext_string.split, but you need to verify that the input is valid UTF-8 first.
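Here is a minimal sketch of the "verify it is UTF-8 first" step, assuming a simplified check is enough. The name is_utf8 is hypothetical. It only checks that each lead byte is followed by the right number of continuation bytes (10xxxxxx). It does not reject overlong encodings or surrogate code points:

```ocaml
(* Simplified UTF-8 well-formedness check: every lead byte must be
   followed by the right number of continuation bytes (10xxxxxx).
   Overlong encodings and surrogates are NOT rejected. *)
let is_utf8 (s : string) : bool =
  let len = String.length s in
  (* Continuation bytes expected after lead byte [c], or -1 if invalid. *)
  let expected_continuations c =
    if c < 0x80 then 0
    else if c land 0xE0 = 0xC0 then 1
    else if c land 0xF0 = 0xE0 then 2
    else if c land 0xF8 = 0xF0 then 3
    else -1
  in
  (* True if index [i] is in bounds and holds a continuation byte. *)
  let is_continuation i = i < len && Char.code s.[i] land 0xC0 = 0x80 in
  let rec check i =
    if i >= len then true
    else
      let n = expected_continuations (Char.code s.[i]) in
      if n < 0 then false
      else
        (* Verify bytes i+1 .. i+n are all continuation bytes. *)
        let rec conts k = k > n || (is_continuation (i + k) && conts (k + 1)) in
        conts 1 && check (i + n + 1)
  in
  check 0
```

A splitter could run this once on the whole {j|...|j} body and report an error before doing any interpolation work.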

bobzhang · 2017-03-07T14:36:27Z

jscomp/syntax/ast_utf8_string.ml

+| [] -> prev
+| (e::re) -> 
+  let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident {txt = Longident.Lident ("^"); loc = e.pexp_loc}} in
+  let new_string_exp = {e with pexp_desc = Parsetree.Pexp_apply (string_concat_exp, [("", prev); ("", e)])} in


To be safe, qualify it as Pervasives.(^) instead of (^).
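To show why qualification matters, here is a minimal sketch. If user code defines its own (^), an unqualified (^) in the generated AST resolves to the user's operator. Pervasives.(^) was the qualified name at the time of this PR. Since OCaml 4.07 the same operator is also reachable as Stdlib.(^), and Pervasives was removed in OCaml 5.0, so the sketch uses Stdlib. The module name User_code is hypothetical:

```ocaml
module User_code = struct
  (* A user-defined operator that shadows string concatenation. *)
  let ( ^ ) (a : int) (b : int) = a + b

  (* What an unqualified expansion would resolve to: the user's (^). *)
  let shadowed = 1 ^ 2

  (* A qualified reference still reaches the real string concatenation. *)
  let qualified = Stdlib.( ^ ) "hello " "world"
end
```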

bobzhang · 2017-03-07T14:37:40Z

The CI error means you need to link str.cmxa (for the C stubs), but I don't think you need regexes here anyway.

chenglou · 2017-03-08T08:31:57Z

jscomp/others/js_json.ml


 external parse : string -> t = "JSON.parse" [@@bs.val]
 (* TODO: more docs when parse error happens or stringify non-stringfy value *)
+external stringify: 'a -> string = "JSON.stringify" [@@bs.val]


Should this be t or 'a?

I think I don't need this anymore. Will move this change to another PR later.

ghost · 2017-03-08T23:14:42Z

@bobzhang Hi, I will remove the dependency on Ext_str but is there any reason why we cannot have the Str module in the compiler? Thanks!

ghost · 2017-03-08T23:19:09Z

@glennsl Hi, thanks, I will use Js.String.make then.

ghost · 2017-03-08T23:26:51Z

@bobzhang Never mind, I realized we probably do not want unnecessary dependencies anyway. I will replace it with a function.

bobzhang · 2017-03-13T13:48:07Z

@dorafmon let me know when it is good for review ; )

ghost · 2017-03-13T13:50:01Z

@bobzhang Sure. I am still missing some unit tests, which is why I did not ask for a review yet. I will work on them and let you know. 😄

ghost · 2017-03-14T01:00:06Z

@bobzhang Hi, I think it's ready for review. Thanks!

bobzhang

Did you use ocp-indent for indentation? I would suggest setting it up; it is a nice tool.

bobzhang · 2017-03-14T13:36:05Z

jscomp/ext/ext_string.ml




+let append s c = s ^ String.make 1 c


The name is confusing. How about append_char?

bobzhang · 2017-03-14T13:39:03Z

jscomp/ounit_tests/ounit_utf8_test.ml

+        end;
+        __LOC__ >:: begin fun _ -> 
+            let s, new_index = Ast_utf8_string.consume_text "Hello \\$world" 0 in
+            let _ = s =~ "Hello \\$world" in


Can we add a few more edge cases?

"\$" "\$x" "\\$x" "\\$" "\\\$" "\\\$x" "\\\\$x"

I am a bit confused here: what is the expected behavior? Note that "Hello \\$world" is in fact equal to {j|Hello \$world|j}, since the OCaml lexer processes escapes in ordinary string literals and turns \\$ into \$. Thanks.

bobzhang · 2017-03-14T13:42:24Z

jscomp/syntax/ast_utf8_string.ml

+    if index = String.length s then new_word, String.length s
+    else begin 
+     match s.[index] with
+     | '$' -> if last_char = '\\' then _consume_text s (index+1) '$' (Ext_string.append new_word '$') 


This logic is incorrect; see my edge cases above.

bobzhang · 2017-03-14T13:43:05Z

jscomp/syntax/ast_utf8_string.ml

+      | '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index))
+      | ')' -> (if !with_par = false then (None, index + 1) else (with_par := false; (Some ident, index+1)))
+      | '$' -> (_consume_delim s (index+1) ident)
+      | c -> if (Char.code c >= Char.code 'a' && Char.code c <= Char.code 'z') || 


OCaml supports char range patterns:

| 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_'
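For example, the identifier check could look like this minimal sketch. The names is_ident_char and ident_length are hypothetical helpers, not code from this PR:

```ocaml
(* Classify identifier characters with char range patterns instead of
   comparing Char.code values by hand. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Length of the run of identifier characters starting at [start]. *)
let ident_length (s : string) (start : int) : int =
  let len = String.length s in
  let rec go i = if i < len && is_ident_char s.[i] then go (i + 1) else i in
  go start - start
```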

bobzhang · 2017-03-14T13:44:39Z

jscomp/syntax/ast_utf8_string.ml

+              } in _transform_individual_expression rexp new_loc (new_exp::nl))
+  | Delim p -> (let new_loc = compute_new_loc loc p in
+                let ident = Parsetree.Pexp_ident { txt = (Longident.Lident p); loc = loc } in
+                let js_to_string = Parsetree.Pexp_ident { txt = 


Can you factor it out into a small function?

bobzhang · 2017-03-14T13:44:57Z

jscomp/syntax/ast_utf8_string.ml

+| [] -> prev
+| (e::re) -> 
+  let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident 
+    {txt = Longident.Ldot (Longident.Lident ("Pervasives"), "^"); loc = e.pexp_loc}} in


same as above

bobzhang · 2017-03-14T13:48:38Z

jscomp/syntax/ppx_entry.ml

-            Location.raise_errorf ~loc "{j||j} is reserved for future use" 
+          else if Ext_string.equal delim Literals.unescaped_j_delimiter then
+            let starting_loc = {loc with loc_end = loc.loc_start} in
+            let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in


can we get rid of empty_string_concat_exp?

Why do we want to get rid of it? I think it makes the implementation much simpler if we keep it.

bobzhang · 2017-03-14T13:51:41Z

jscomp/syntax/ast_utf8_string.ml

+    if index >= String.length s then List.rev nl 
+    else begin
+      match consume_text s index, consume_delim s index with
+      | ("" , str_index)  , (None   , err_index) -> let new_loc = error_reporting_loc loc index err_index in Location.raise_errorf ~loc:new_loc "Not a valid es6 template string"


Instead of raising directly, can we return an option for easier unit testing?
The current unit tests depend on compiler-libs; it would be nice if such a utility were self-contained.
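Here is a minimal sketch of a self-contained parser along those lines. It returns a result carrying the byte offset of the error instead of raising, so it needs no compiler-libs, and the ppx side can turn Error into a located error. Everything here is hypothetical: the segment type, the name parse, and the escape rules. It assumes the rules settled later in this thread (\\ becomes \, \$ becomes $, any other backslash is kept as is) and that interpolation is only $x or $(x):

```ocaml
(* Hypothetical segment type: literal text or an interpolated variable. *)
type segment = Text of string | Var of string

let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Parse the body of a {j|...|j} string; Error carries a byte offset. *)
let parse (s : string) : (segment list, int) result =
  let len = String.length s in
  let buf = Buffer.create 16 in
  (* Move any pending literal text from [buf] onto the accumulator. *)
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let t = Text (Buffer.contents buf) in
      Buffer.clear buf;
      t :: acc
    end
  in
  (* Index just past the identifier characters starting at [i]. *)
  let rec ident_end i =
    if i < len && is_ident_char s.[i] then ident_end (i + 1) else i
  in
  let rec loop i acc =
    if i >= len then Ok (List.rev (flush acc))
    else
      match s.[i] with
      | '\\' when i + 1 < len && (s.[i + 1] = '\\' || s.[i + 1] = '$') ->
        (* Escape: keep only the escaped character; bounds checked above. *)
        Buffer.add_char buf s.[i + 1];
        loop (i + 2) acc
      | '$' when i + 1 < len && s.[i + 1] = '(' ->
        (* $(name): require a non-empty name and a closing paren. *)
        let stop = ident_end (i + 2) in
        if stop = i + 2 || stop >= len || s.[stop] <> ')' then Error i
        else
          let name = String.sub s (i + 2) (stop - i - 2) in
          let acc = flush acc in
          loop (stop + 1) (Var name :: acc)
      | '$' ->
        (* $name: require a non-empty name. *)
        let stop = ident_end (i + 1) in
        if stop = i + 1 then Error i
        else
          let name = String.sub s (i + 1) (stop - i - 1) in
          let acc = flush acc in
          loop stop (Var name :: acc)
      | c ->
        Buffer.add_char buf c;
        loop (i + 1) acc
  in
  loop 0 []
```

Written with {|...|} literals, as suggested later in the thread, bobzhang's edge cases become plain assertions. For example, parse {|\\$x|} gives Ok [Text "\\"; Var "x"], which is a single backslash followed by x.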

bobzhang · 2017-03-15T13:30:46Z

When we write tests, we just use {||}, which does not do any escaping, so {|\|} is the same as "\\".

ghost · 2017-03-15T22:03:41Z

@bobzhang Sorry, I don't quite get what you mean here. Functions like consume_delim and consume_text, which are tested in the OUnit unit tests, work on the strings we extract from {j|...|j}. I guess I can add some BS tests in the test folder to test the template strings directly.

bobzhang · 2017-03-15T22:40:13Z

@dorafmon It seems your code does not work on {j|\\$x|j}. And yes, it is good to always use {||} in tests when escaping rules matter.

ghost · 2017-03-15T22:44:32Z

It compiles to var escape0 = "" + "\\$x"; which is correct, right? Since \$ is escaped, we should just keep both \.

bobzhang · 2017-03-15T22:51:26Z

It should be "\\" + x, i.e. a single backslash followed by the interpolated x.

ghost · 2017-03-16T01:26:10Z

Just to clarify: we only need to treat \\ as \ and \$ as $, right? We don't need to implement the full escaping rules specified in https://caml.inria.fr/pub/docs/manual-ocaml/lex.html.

bobzhang

Note that the lexical conventions will follow JS style. Would you take a look at how we do UTF-8 decoding: https://github.com/bloomberg/bucklescript/blob/master/jscomp/syntax/ast_utf8_string.ml#L26

I think you need to do that in the first pass, otherwise the location would be wrong. For example, in
{j|你好$x|j} the offset of x would be 6 if you don't do UTF-8 decoding.
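To make the byte/character mismatch concrete: 你 and 好 each take 3 bytes in UTF-8, so in 你好$x the $ starts at byte 6 and x at byte 7, but at code points 2 and 3. Here is a minimal sketch of the conversion, assuming valid UTF-8 input and a byte offset on a character boundary. The name code_point_offset is hypothetical:

```ocaml
(* Convert a byte offset to a code-point offset by counting the bytes
   before it that are not UTF-8 continuation bytes (10xxxxxx).
   Assumes [s] is valid UTF-8 and [byte_off] is on a char boundary. *)
let code_point_offset (s : string) (byte_off : int) : int =
  let count = ref 0 in
  for i = 0 to byte_off - 1 do
    if Char.code s.[i] land 0xC0 <> 0x80 then incr count
  done;
  !count
```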

bobzhang · 2017-03-16T23:50:38Z

jscomp/bin/all_ounit_tests.ml

-        else (new_word, index)
-      | c -> _consume_text s (index + 1) c (Ext_string.append new_word c)
+      | '\\' -> (if index + 1 = String.length s then "", index else
+                   match s.[index+1] with


This may go out of bounds.

bobzhang · 2017-03-16T23:50:54Z

jscomp/bin/all_ounit_tests.ml

-                Char.code c = Char.code '_'
-        then _consume_delim s (index+1) (Ext_string.append ident c)
-        else if !with_par = false then (Some ident, index) else (None, index + 1)
+      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9'| '_' ->_consume_delim s (index+1) (Ext_string.append_char ident s.[index])


ghost · 2017-04-01T15:47:52Z

@bobzhang Hi, I rebased my changes onto master and fixed some small issues from the code review (UTF-8 decoding and OCaml lexical escaping now happen first, so the locations are correct). As I understand it, we now have a different error-reporting mechanism instead of Location.raise_errorf? Could you point me to it so I can update my code? Overall this is already well tested; let's fix this and ship it.

bobzhang · 2017-04-01T21:23:43Z

jscomp/bin/bspp.ml

    Format.pp_print_string ppf if_highlight
  else begin
-    fprintf ppf "%a%a %s" print loc print_error_prefix () msg;
-    List.iter (Format.fprintf ppf "@\n@[<2>%a@]" default_error_reporter) sub


Hi, would you update your ocaml submodule? I cherry-picked a fix from upstream. Thanks!

bobzhang · 2017-04-01T21:30:53Z

jscomp/syntax/ppx_entry.ml

+            let starting_loc = {loc with loc_end = loc.loc_start} in
+            let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in
+            let exps_list = Ast_utf8_string.transform_es6_style_template_string s starting_loc in
+            Ast_utf8_string.fold_expression_list_with_string_concat empty_string_concat_exp exps_list


Can you export just one function from Ast_utf8_string, so this looks simpler?

bobzhang · 2017-04-01T21:32:02Z

jscomp/syntax/ast_utf8_string.ml

@@ -1,5 +1,5 @@
 (* Copyright (C) 2015-2016 Bloomberg Finance L.P.
- * 


Can you create an interface file for this module? Sorry for my laziness.

bobzhang · 2017-04-01T21:34:18Z

jscomp/syntax/ast_utf8_string.ml

+                   | '$' -> _consume_text s (index+2) ' ' (Ext_string.append_char new_word '$')
+                   | c -> _consume_text s (index+1) '\\' (Ext_string.append_char new_word '\\'))
+      | '$' -> (new_word, index)
+      | c -> _consume_text s (index + 1) c (Ext_string.append_char new_word c)


Ext_string.append_char is highly inefficient. You should record offsets instead of creating a new string each time.

Also, there is no UTF-8 handling when consuming (decoding).
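Since s ^ String.make 1 c copies the whole accumulated string on every call, building an n-character run this way costs O(n²) copying. Here is a minimal sketch of the offset-based alternative: scan indices only, then copy once with String.sub. The names text_end and consume_plain are hypothetical. This version stops at '$' or '\\' and leaves escape handling to the caller:

```ocaml
(* Index of the next '$' or '\\' at or after [start], or the string
   length if there is none. Only indices are tracked while scanning. *)
let text_end (s : string) (start : int) : int =
  let len = String.length s in
  let rec go i =
    if i >= len then len
    else match s.[i] with
      | '$' | '\\' -> i
      | _ -> go (i + 1)
  in
  go start

(* The plain-text run starting at [start], copied once, and the index
   just after it. *)
let consume_plain (s : string) (start : int) : string * int =
  let stop = text_end s start in
  (String.sub s start (stop - start), stop)
```

For instance, consume_plain "Hello $world" 0 returns ("Hello ", 6), matching the expectation in the existing consume_text unit test.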

bobzhang · 2017-04-01T21:40:54Z

jscomp/syntax/ast_utf8_string.ml

+    if index = String.length s then (if !with_par = true then (None, index) else (Some ident, index))
+    else
+      match s.[index] with
+      | '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index))


I am unclear about this piece of code. Our interpolation is quite simple (no nested interpolation):
it is only $x or $(x).

bobzhang · 2017-04-01T21:42:16Z

jscomp/syntax/ast_utf8_string.ml

+let error_reporting_loc (loc:Location.t) start_index end_index =
+  let new_loc =
+    {loc with loc_start = {loc.loc_start with pos_cnum = loc.loc_start.pos_cnum + start_index};
+              loc_end   = {loc.loc_end   with pos_cnum = loc.loc_start.pos_cnum + end_index }} in new_loc


We should record offsets instead of creating a new loc each time; otherwise it does a lot of allocation.
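Here is a minimal sketch of that idea. The scanner keeps only integer offsets, and position records are built once, on the error path. compiler-libs' Location.t holds its loc_start and loc_end as stdlib Lexing.position records, so the sketch works with Lexing.position directly. The helpers shift and error_span are hypothetical:

```ocaml
(* Shift a position forward by [n] bytes; used only when reporting. *)
let shift (p : Lexing.position) (n : int) : Lexing.position =
  { p with Lexing.pos_cnum = p.Lexing.pos_cnum + n }

(* Hypothetical: turn the (start, end) offsets of a failed parse into
   positions relative to the start of the string literal. Nothing is
   allocated here unless an error is actually reported. *)
let error_span (literal_start : Lexing.position) (start_off : int) (end_off : int)
  : Lexing.position * Lexing.position =
  (shift literal_start start_off, shift literal_start end_off)
```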

bobzhang reviewed Mar 7, 2017

chenglou reviewed Mar 8, 2017

bobzhang reviewed Mar 14, 2017

bobzhang reviewed Mar 16, 2017

dorafmon added 8 commits March 31, 2017 22:48

Issue #1292: first implementation

40809bc

Issue #1292: remove regex dep and added more unit tests

4424b25

Issue #1292: fix a small issue

0e8187a

Issue #1292: added more unit tests

d967afd

Issue #1292: add some brief documentation

8825624

Issue #1292: per review

e928026

Issue #1292: added more unit tests

1aa4a0a

Issue #1292: clean up and utf8 decode first

776015f

bobzhang reviewed Apr 1, 2017

ghost closed this Apr 9, 2017

rafayepes mentioned this pull request Jan 22, 2018 in Dangerous string interpolation #2461 (closed)
